Search CORE

17 research outputs found

Testing Properties of Multiple Distributions with Few Samples

Author: Aliakbarpour Maryam
Silwal Sandeep
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 17/11/2019
Field of study

We propose a new setting for testing properties of distributions while receiving samples from several distributions, but few samples per distribution. Given samples from

s

distributions,

p_1, p_2, \ldots, p_s

, we design testers for the following problems: (1) Uniformity Testing: Testing whether all the

p_i

's are uniform or

\epsilon

-far from being uniform in

\ell_1

-distance (2) Identity Testing: Testing whether all the

p_i

's are equal to an explicitly given distribution

q

\epsilon

-far from

q

\ell_1

-distance, and (3) Closeness Testing: Testing whether all the

p_i

's are equal to a distribution

q

which we have sample access to, or

\epsilon

-far from

q

\ell_1

-distance. By assuming an additional natural condition about the source distributions, we provide sample optimal testers for all of these problems.Comment: ITCS 202

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A Concentration Inequality for the Facility Location Problem

Author: Silwal Sandeep
Publication venue
Publication date: 08/12/2020
Field of study

We give a concentration inequality for a stochastic version of the facility location problem on the plane. We show the objective

C_n(X) = \min_{F \subseteq [0,1]^2} \, |F| + \sum_{x\in X} \min_{f \in F} \| x-f\|

is concentrated in an interval of length

O(n^{1/6})

and

\mathbb{E}[C_n] = \Theta(n^{2/3})

if the input

X

consists of

n

i.i.d. uniform points in the unit square. Our main tool is to use a suitable geometric quantity, previously used in the design of approximation algorithms for the facility location problem, to analyze a martingale process.Comment: 6 pages, 1 figur

arXiv.org e-Print Archive

Property Testing of LP-Type Problems

Author: Epstein Rogers
Silwal Sandeep
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 47th International Colloquium on Automata, Languages, and Programming (ICALP 2020)
Publication date: 19/11/2019
Field of study

Given query access to a set of constraints S, we wish to quickly check if some objective function ? subject to these constraints is at most a given value k. We approach this problem using the framework of property testing where our goal is to distinguish the case ?(S) ? k from the case that at least an ? fraction of the constraints in S need to be removed for ?(S) ? k to hold. We restrict our attention to the case where (S,?) are LP-Type problems which is a rich family of combinatorial optimization problems with an inherent geometric structure. By utilizing a simple sampling procedure which has been used previously to study these problems, we are able to create property testers for any LP-Type problem whose query complexities are independent of the number of constraints. To the best of our knowledge, this is the first work that connects the area of LP-Type problems and property testing in a systematic way. Among our results are property testers for a variety of LP-Type problems that are new and also problems that have been studied previously such as a tight upper bound on the query complexity of testing clusterability with one cluster considered by Alon, Dar, Parnas, and Ron (FOCS 2000). We also supply a corresponding tight lower bound for this problem and other LP-Type problems using geometric constructions

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Smoothed Analysis of the Condition Number Under Low-Rank Perturbations

Author: Shah Rikhav
Silwal Sandeep
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2021)
Publication date: 01/01/2021
Field of study

Let

M

be an arbitrary

n

n

matrix of rank

n-k

. We study the condition number of

M

plus a \emph{low-rank} perturbation

UV^T

where

U, V

are

n

k

random Gaussian matrices. Under some necessary assumptions, it is shown that

M+UV^T

is unlikely to have a large condition number. The main advantages of this kind of perturbation over the well-studied dense Gaussian perturbation, where every entry is independently perturbed, is the

O(nk)

cost to store

U,V

and the

O(nk)

increase in time complexity for performing the matrix-vector multiplication

(M+UV^T)x

. This improves the

\Omega(n^2)

space and time complexity increase required by a dense perturbation, which is especially burdensome if

M

is originally sparse. Our results also extend to the case where

U

and

V

have rank larger than

k

and to symmetric and complex settings. We also give an application to linear systems solving and perform some numerical experiments. Lastly, barriers in applying low-rank noise to other problems studied in the smoothed analysis framework are discussed

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Faster Linear Algebra for Distance Matrices

Author: Indyk Piotr
Silwal Sandeep
Publication venue
Publication date: 26/10/2022
Field of study

The distance matrix of a dataset

X

n

points with respect to a distance function

f

represents all pairwise distances between points in

X

induced by

f

. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices with the goal of designing fast algorithms, which are specifically tailored for distance matrices, for fundamental linear algebraic primitives. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the

\ell_1

metric for which we get a linear runtime, as well as an

\Omega(n^2)

lower bound for any algorithm which computes a matrix-vector product for the

\ell_{\infty}

case, showing a separation between the

\ell_1

and the

\ell_{\infty}

metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative error low-rank approximation for the distance matrix induced by

\ell_1

and

\ell_2^2

functions and the fastest algorithm for computing an additive error low-rank approximation for the

\ell_2

metric, in addition to applications for fast matrix multiplication among others. We also give algorithms for constructing distance matrices and show that one can construct an approximate

\ell_2

distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma.Comment: Selected as Oral for NeurIPS 202

arXiv.org e-Print Archive

Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

Author: Indyk Piotr
Narayanan Shyam
Silwal Sandeep
Zamir Or
Publication venue
Publication date: 05/07/2021
Field of study

Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and the single-linkage hierarchical clustering problem, which is equivalent to computing the minimum spanning tree. We show that if we project the input pointset

X

onto a random

d = O(d_X)

-dimensional subspace (where

d_X

is the doubling dimension of

X

), then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for minimum spanning tree, but with the dimension

d

having an extra

\log \log n

term and the approximation factor being arbitrarily close to

1

. Furthermore, we extend these results to approximating solutions instead of just their costs. Lastly, we provide experimental results to validate the quality of solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of

k

-means and

k

-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of

X

.Comment: 25 pages. Published as a conference paper in ICML 202

arXiv.org e-Print Archive

DSpace@MIT